85 research outputs found
3D RECONSTRUCTION FROM STEREO/RANGE IMAGES
3D reconstruction from stereo/range image is one of the most fundamental and extensively researched topics in computer vision. Stereo research has recently experienced somewhat of a new era, as a result of publically available performance testing such as the Middlebury data set, which has allowed researchers to compare their algorithms against all the state-of-the-art algorithms. This thesis investigates into the general stereo problems in both the two-view stereo and multi-view stereo scopes. In the two-view stereo scope, we formulate an algorithm for the stereo matching problem with careful handling of disparity, discontinuity and occlusion. The algorithm works with a global matching stereo model based on an energy minimization framework. The experimental results are evaluated on the Middlebury data set, showing that our algorithm is the top performer. A GPU approach of the Hierarchical BP algorithm is then proposed, which provides similar stereo quality to CPU Hierarchical BP while running at real-time speed. A fast-converging BP is also proposed to solve the slow convergence problem of general BP algorithms. Besides two-view stereo, ecient multi-view stereo for large scale urban reconstruction is carefully studied in this thesis. A novel approach for computing depth maps given urban imagery where often large parts of surfaces are weakly textured is presented. Finally, a new post-processing step to enhance the range images in both the both the spatial resolution and depth precision is proposed
Fast Preprocessing for Robust Face Sketch Synthesis
Exemplar-based face sketch synthesis methods usually meet the challenging
problem that input photos are captured in different lighting conditions from
training photos. The critical step causing the failure is the search of similar
patch candidates for an input photo patch. Conventional illumination invariant
patch distances are adopted rather than directly relying on pixel intensity
difference, but they will fail when local contrast within a patch changes. In
this paper, we propose a fast preprocessing method named Bidirectional
Luminance Remapping (BLR), which interactively adjust the lighting of training
and input photos. Our method can be directly integrated into state-of-the-art
exemplar-based methods to improve their robustness with ignorable computational
cost.Comment: IJCAI 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sketch/index.htm
Multimodal Multipart Learning for Action Recognition in Depth Videos
The articulated and complex nature of human actions makes the task of action
recognition difficult. One approach to handle this complexity is dividing it to
the kinetics of body parts and analyzing the actions based on these partial
descriptors. We propose a joint sparse regression based learning method which
utilizes the structured sparsity to model each action as a combination of
multimodal features from a sparse set of body parts. To represent dynamics and
appearance of parts, we employ a heterogeneous set of depth and skeleton based
features. The proper structure of multimodal multipart features are formulated
into the learning framework via the proposed hierarchical mixed norm, to
regularize the structured features of each part and to apply sparsity between
them, in favor of a group feature selection. Our experimental results expose
the effectiveness of the proposed learning method in which it outperforms other
methods in all three tested datasets while saturating one of them by achieving
perfect accuracy
Stylizing Face Images via Multiple Exemplars
We address the problem of transferring the style of a headshot photo to face
images. Existing methods using a single exemplar lead to inaccurate results
when the exemplar does not contain sufficient stylized facial components for a
given photo. In this work, we propose an algorithm to stylize face images using
multiple exemplars containing different subjects in the same style. Patch
correspondences between an input photo and multiple exemplars are established
using a Markov Random Field (MRF), which enables accurate local energy transfer
via Laplacian stacks. As image patches from multiple exemplars are used, the
boundaries of facial components on the target image are inevitably
inconsistent. The artifacts are removed by a post-processing step using an
edge-preserving filter. Experimental results show that the proposed algorithm
consistently produces visually pleasing results.Comment: In CVIU 2017. Project Page:
http://www.cs.cityu.edu.hk/~yibisong/cviu17/index.htm
Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification
In order to encode the class correlation and class specific information in
image representation, we propose a new local feature learning approach named
Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to
hierarchically learn feature transformation filter banks to transform raw pixel
image patches to features. The learned filter banks are expected to: (1) encode
common visual patterns of a flexible number of categories; (2) encode
discriminative information; and (3) hierarchically extract patterns at
different visual levels. Particularly, in each single layer of DDSFL, shareable
filters are jointly learned for classes which share the similar patterns.
Discriminative power of the filters is achieved by enforcing the features from
the same category to be close, while features from different categories to be
far away from each other. Furthermore, we also propose two exemplar selection
methods to iteratively select training data for more efficient and effective
learning. Based on the experimental results, DDSFL can achieve very promising
performance, and it also shows great complementary effect to the
state-of-the-art Caffe features.Comment: Pattern Recognition, Elsevier, 201
Learning to Hallucinate Face Images via Component Generation and Enhancement
We propose a two-stage method for face hallucination. First, we generate
facial components of the input image using CNNs. These components represent the
basic facial structures. Second, we synthesize fine-grained facial structures
from high resolution training images. The details of these structures are
transferred into facial components for enhancement. Therefore, we generate
facial components to approximate ground truth global appearance in the first
stage and enhance them through recovering details in the second stage. The
experiments demonstrate that our method performs favorably against
state-of-the-art methodsComment: IJCAI 2017. Project page:
http://www.cs.cityu.edu.hk/~yibisong/ijcai17_sr/index.htm
Joint Face Hallucination and Deblurring via Structure Generation and Detail Enhancement
We address the problem of restoring a high-resolution face image from a
blurry low-resolution input. This problem is difficult as super-resolution and
deblurring need to be tackled simultaneously. Moreover, existing algorithms
cannot handle face images well as low-resolution face images do not have much
texture which is especially critical for deblurring. In this paper, we propose
an effective algorithm by utilizing the domain-specific knowledge of human
faces to recover high-quality faces. We first propose a facial component guided
deep Convolutional Neural Network (CNN) to restore a coarse face image, which
is denoted as the base image where the facial component is automatically
generated from the input face image. However, the CNN based method cannot
handle image details well. We further develop a novel exemplar-based detail
enhancement algorithm via facial component matching. Extensive experiments show
that the proposed method outperforms the state-of-the-art algorithms both
quantitatively and qualitatively.Comment: In IJCV 201
- …